Random Search for Hyper-Parameter Optimization

Authors

  • James Bergstra
  • Yoshua Bengio
Abstract

Many machine learning algorithms have hyper-parameters: flags, values, and other configuration information that guides the algorithm. Sometimes this configuration applies to the space of functions that the learning algorithm searches (e.g. the number of nearest neighbours to use in KNN). Sometimes this configuration applies to the way in which the search is conducted (e.g. the step size in stochastic gradient descent). For better or for worse, it is common practice to judge a learning algorithm by its best-case-scenario performance. Researchers are expected to maximize the performance of their algorithm by optimizing over hyper-parameter values, e.g. by cross-validating using data withheld from the training set. Despite decades of research into global optimization (e.g. [8, 4, 9, 10]) and the publication of several hyper-parameter optimization algorithms (e.g. [7, 1, 3]), it would seem that most machine learning researchers still prefer to carry out this optimization by hand and by grid search (e.g. [6, 5, 2]). Here, we argue that in theory and in experiment, grid search (i.e. lattice-based brute-force search) should almost never be used. Instead, quasi-random or even pseudo-random experiment designs (random experiments) should be preferred. Random experiments are just as easily parallelized as grid search, just as simple to design, and more reliable. Looking forward, we would like to investigate sequential hyper-parameter optimization algorithms, and we hope that random search will serve as a credible baseline. Does random search work better? We did an experiment (Fig. 1) similar to [5] using random search instead of grid search. We op…

[Figure 1: x-axis "# trials" (1, 2, 4, 8, 16, 32); y-axis from 0.0 to 1.0]
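To make the contrast concrete, here is a minimal sketch (not from the paper) of grid search versus random search over two hyper-parameters with the same budget of 16 trials. The toy objective, the value ranges, and the log-uniform sampling are illustrative assumptions; the point it shows is that the random design places 16 distinct values on each axis while the 4 x 4 grid places only 4.

```python
import random

def validation_error(learning_rate, l2_penalty):
    # Stand-in for training a model and measuring held-out error.
    return (learning_rate - 0.01) ** 2 + 0.1 * (l2_penalty - 1e-4) ** 2

# Grid search: a 4 x 4 lattice tries only 4 distinct values per axis.
lr_grid = [1e-4, 1e-3, 1e-2, 1e-1]
l2_grid = [1e-6, 1e-5, 1e-4, 1e-3]
grid_trials = [(lr, l2) for lr in lr_grid for l2 in l2_grid]

# Random search: the same budget of 16 trials, but 16 distinct values
# per axis, so the dimension that matters is covered more densely.
def log_uniform(low_exp, high_exp):
    return 10 ** random.uniform(low_exp, high_exp)

random_trials = [(log_uniform(-4, -1), log_uniform(-6, -3))
                 for _ in range(16)]

best_grid = min(grid_trials, key=lambda t: validation_error(*t))
best_rand = min(random_trials, key=lambda t: validation_error(*t))
print("grid best:", best_grid)
print("random best:", best_rand)
```

Both designs are embarrassingly parallel, since every trial is independent of the others.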


Similar articles

Effects of Random Sampling on SVM Hyper-parameter Tuning

Hyper-parameter tuning is one of the crucial steps in the successful application of machine learning algorithms to real data. In general, the tuning process is modeled as an optimization problem, for which several methods have been proposed. For complex algorithms, the evaluation of a hyper-parameter configuration is expensive, and its runtime is sped up through data sampling. In this paper, t...
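The idea described above, evaluating candidate configurations on a random sub-sample of the data to keep each evaluation cheap, might be sketched as follows. The digits dataset, the 500-example sub-sample, and the random pool of 20 SVM configurations are illustrative assumptions, not the paper's protocol.

```python
import random
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Evaluate candidate configurations on a random sub-sample only,
# so each evaluation stays cheap.
idx = random.sample(range(len(X)), k=500)
X_small, y_small = X[idx], y[idx]

# Randomly drawn SVM hyper-parameter candidates (illustrative ranges).
candidates = [{"C": 10 ** random.uniform(-2, 3),
               "gamma": 10 ** random.uniform(-5, -1)}
              for _ in range(20)]

def subsample_score(params):
    # Cross-validated accuracy on the sub-sample, not the full data.
    return cross_val_score(SVC(**params), X_small, y_small, cv=3).mean()

best = max(candidates, key=subsample_score)
print("selected hyper-parameters:", best)
```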

Algorithms for Hyper-Parameter Optimization

Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyper-parameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors m...

Determination of Optimal Support Vector Machines for Hyperspectral Image Classification Based on a Genetic Algorithm

Hyperspectral remote sensing imagery, thanks to its rich source of spectral information, provides an efficient tool for ground classification in complex geographical areas with similar classes. Owing to the robustness of Support Vector Machines (SVMs) in high-dimensional spaces, they are an efficient tool for the classification of hyperspectral imagery. However, there are two optimization issues which s...

Hyper-heuristics Can Achieve Optimal Performance for Pseudo-Boolean Optimisation

Selection hyper-heuristics are randomised search methodologies which choose and execute heuristics from a set of low-level heuristics. Recent research for the LeadingOnes benchmark function has shown that the standard Simple Random, Permutation, Random Gradient, Greedy and Reinforcement Learning selection mechanisms show no effects of learning. The idea behind the learning mechanisms is to cont...
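As a concrete illustration of a selection hyper-heuristic, the sketch below applies the Simple Random mechanism to the LeadingOnes benchmark. The bit-string length, the two toy low-level mutation heuristics, and the accept-if-not-worse rule are assumptions made for this sketch, not the paper's experimental setup.

```python
import random

N = 50  # bit-string length (an illustrative choice)

def leading_ones(x):
    # Number of consecutive 1-bits from the left.
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def flip_one(x):
    # Low-level heuristic: flip one uniformly chosen bit.
    i = random.randrange(len(x))
    return x[:i] + [1 - x[i]] + x[i + 1:]

def flip_two(x):
    # Low-level heuristic: flip two random bits (possibly the same one).
    return flip_one(flip_one(x))

low_level_heuristics = [flip_one, flip_two]

x = [random.randint(0, 1) for _ in range(N)]
steps = 0
while leading_ones(x) < N:
    h = random.choice(low_level_heuristics)   # Simple Random selection
    y = h(x)
    if leading_ones(y) >= leading_ones(x):    # keep offspring if not worse
        x = y
    steps += 1
print("reached the optimum in", steps, "steps")
```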

Critical Hyper-Parameters: No Random, No Cry

The selection of hyper-parameters is critical in Deep Learning. Because of the long training time of complex models and the availability of compute resources in the cloud, “one-shot” optimization schemes – where the sets of hyper-parameters are selected in advance (e.g. on a grid or in a random manner) and the training is executed in parallel – are commonly used. [1] show that grid search is su...
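A "one-shot" scheme of the kind described above might be sketched as follows: every hyper-parameter set is drawn in advance (here at random) and the trainings run in parallel with no sequential feedback. The toy objective, the budget of 32 trials, and the use of a process pool are illustrative assumptions rather than the paper's method.

```python
import random
from multiprocessing import Pool

def train_and_score(params):
    lr, width = params
    # Stand-in for a long training run that returns a validation score.
    return -((lr - 3e-3) ** 2) - 1e-6 * (width - 256) ** 2

if __name__ == "__main__":
    # All hyper-parameter sets are selected up front; since no trial
    # depends on another, they can be executed in parallel.
    trials = [(10 ** random.uniform(-4, -1),
               random.choice([64, 128, 256, 512]))
              for _ in range(32)]
    with Pool(processes=4) as pool:
        scores = pool.map(train_and_score, trials)
    best = max(zip(trials, scores), key=lambda ts: ts[1])[0]
    print("best configuration:", best)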


Journal:
  • Journal of Machine Learning Research

Volume 13, Issue 

Pages  -

Publication date: 2012